Notebook prepared by Clare Gibson.

Introduction

Goals

In this notebook I will:

  • practise optimizing \(\vec{w}\) and \(b\) using gradient descent
  • practise what I have learned in weeks 1 and 2 of the Stanford Machine Learning specialization
  • practise using R to implement a multiple linear regression model
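For reference, the model and update rules from weeks 1 and 2 that I will be implementing are the multiple linear regression model

\[ f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b \]

and gradient descent, which repeatedly updates every weight \(w_j\) and the intercept \(b\) simultaneously:

\[ w_j \leftarrow w_j - \alpha \frac{\partial J(\vec{w},b)}{\partial w_j}, \qquad b \leftarrow b - \alpha \frac{\partial J(\vec{w},b)}{\partial b} \]

where \(\alpha\) is the learning rate and \(J(\vec{w},b)\) is the mean squared error cost.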

Tools

In this notebook I use the following tools:

  • here for building file paths relative to the project root
  • tidyverse for data wrangling
  • ggplot2 for plotting

# Load packages
library(here)
library(tidyverse)
library(ggplot2)

Data

In this notebook I use the Kaggle Perth House Prices dataset. The chunk below calls a script that loads the data into R and prepares it for the linear regression exercise.

# Load data
source(here("R/utils.R"))

# Store data in variable df
df <- model_1

# Show the head
head(df)
## # A tibble: 6 × 4
##   price  size bedrooms   age
##   <dbl> <dbl>    <dbl> <dbl>
## 1   565   160        4    15
## 2   365   139        3     6
## 3   287    86        3    36
## 4   255    59        2    65
## 5   325   131        4    18
## 6   409   118        4    22

Problem statement

In this exercise I build a model to predict house prices. The training dataset contains 29938 examples, each with three features (size, bedrooms and age). I will fit a multiple linear regression model to these values so that I can predict the price of other houses.

The code chunk below creates the x_train and y_train variables.

# Create x_train (feature matrix) and y_train (target prices)
x_train <- as.matrix(df[, -1])
y_train <- as.matrix(df[, 1])

The training examples are now stored in the matrix x_train, with one example per row. With \(m\) training examples and \(n\) features, x_train is a matrix with dimensions \((m, n)\); here that is \((29938, 3)\).

dim(x_train)
## [1] 29938     3
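With the data in matrix form, the cost and its gradients can be computed in a vectorized way over all rows at once. Below is a minimal sketch of how I might write these in R; the helper names compute_cost and compute_gradients are my own, not from the course materials.

```r
# Mean squared error cost J(w, b) = (1 / (2m)) * sum((X %*% w + b - y)^2)
compute_cost <- function(X, y, w, b) {
  m <- nrow(X)
  err <- X %*% w + b - y        # (m, 1) residuals
  sum(err^2) / (2 * m)
}

# Gradients of J with respect to w (an (n, 1) matrix) and b (a scalar)
compute_gradients <- function(X, y, w, b) {
  m <- nrow(X)
  err <- X %*% w + b - y
  list(dw = t(X) %*% err / m,   # (n, 1) gradient for w
       db = sum(err) / m)       # scalar gradient for b
}
```

Starting from w <- matrix(0, ncol(x_train), 1) and b <- 0, one gradient descent step with learning rate alpha would then be g <- compute_gradients(x_train, y_train, w, b); w <- w - alpha * g$dw; b <- b - alpha * g$db.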